Labrador
TAXI: Evaluating Categorical Knowledge Editing for Language Models
Powell, Derek, Gerych, Walter, Hartvigsen, Thomas
Humans rarely learn one fact in isolation. Instead, learning a new fact induces knowledge of other facts about the world. For example, in learning that a korat is a type of cat, you also infer it is a mammal and has claws, keeping your model of the world consistent. Knowledge editing aims to inject new facts into language models to improve their factuality, but current benchmarks fail to evaluate consistency, which is critical for efficient, accurate, and generalizable edits. We manually create TAXI, a new benchmark dataset designed specifically to evaluate consistency in categorical knowledge edits. TAXI contains 11,120 multiple-choice queries for 976 edits spanning 41 categories (e.g., Dogs), 164 subjects (e.g., Labrador), and 183 properties (e.g., is a mammal). We then use TAXI to evaluate popular editors' categorical consistency, measuring how often editing a subject's category appropriately edits its properties. We find that 1) the editors achieve marginal, yet non-random, consistency, 2) their consistency far underperforms human baselines, and 3) consistency is more achievable when editing atypical subjects. Our code and data are available at https://github.com/derekpowell/taxi.
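To make the consistency metric concrete, here is a minimal, hypothetical sketch of how a TAXI-style edit could be scored: an edit changes a subject's category, and consistency is the fraction of that subject's property queries answered in line with the new category. The `PropertyQuery` class, `consistency` function, and toy model below are illustrative stand-ins, not the benchmark's actual API.

```python
# Illustrative sketch of scoring categorical consistency for one edit.
# The names here are hypothetical, not the TAXI benchmark's real API; they
# only render the idea that editing a subject's category should also
# change its implied properties.
from dataclasses import dataclass

@dataclass
class PropertyQuery:
    question: str              # e.g. "Is it a mammal?"
    options: list              # multiple-choice options
    expected_after_edit: str   # answer consistent with the edited category

def consistency(subject, queries, ask_model):
    """Fraction of property queries answered consistently with the edit."""
    hits = 0
    for q in queries:
        answer = ask_model(subject, q.question, q.options)
        hits += int(answer == q.expected_after_edit)
    return hits / len(queries)

# Toy usage: after editing "a Labrador is a kind of fish", a consistent model
# should now deny mammal-typical properties for the subject.
queries = [
    PropertyQuery("Is it a mammal?", ["yes", "no"], "no"),
    PropertyQuery("Does it have claws?", ["yes", "no"], "no"),
]
fake_model = lambda subject, question, options: "no"   # stand-in for an edited LM
print(consistency("Labrador", queries, fake_model))    # -> 1.0
```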
Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models
Luo, Yi, Lin, Zhenghao, Zhang, Yuhao, Sun, Jiashuo, Lin, Chen, Xu, Chengjin, Su, Xiangdong, Shen, Yelong, Guo, Jian, Gong, Yeyun
Large Language Models (LLMs) exhibit impressive capabilities but also present risks such as biased content generation and privacy issues. Current alignment techniques include principle-driven integration, but they face challenges arising from the imprecision of manually crafted rules and inadequate risk perception in models without safety training. To address these, we introduce Guide-Align, a two-stage approach. Initially, a safety-trained model identifies potential risks and formulates specific guidelines for various inputs, establishing a comprehensive library of guidelines and a model for input-guideline retrieval. Subsequently, the retrieval model matches new inputs with relevant guidelines, which guide LLMs in response generation to ensure safe and high-quality outputs, thereby aligning with human values. An optional additional stage involves fine-tuning a model on well-aligned datasets generated through the second-stage process. Our method customizes guidelines to accommodate diverse inputs, enhancing the granularity and comprehensiveness of the guideline library. Furthermore, it incorporates safety expertise from a safety-trained LLM through a lightweight retrieval model. We evaluate our approach on three benchmarks, demonstrating significant improvements in LLM security and quality. Notably, our fine-tuned model, Labrador, even at 13 billion parameters, outperforms GPT-3.5-turbo and surpasses GPT-4 in alignment capabilities.
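The retrieval step described above can be pictured with a small, hypothetical sketch: embed the incoming input, select the most similar guidelines from the library, and prepend them to the prompt that the LLM answers. The bag-of-characters embedding and the guideline texts below are placeholders, not the paper's retrieval model or guideline library.

```python
# Minimal sketch of a Guide-Align-style second stage: retrieve relevant
# guidelines for a new input and prepend them to the generation prompt.
# The embedding function and guidelines are illustrative placeholders; a
# real system would use a trained lightweight retrieval model.
import numpy as np

GUIDELINES = [
    "Do not reveal personal or private information about individuals.",
    "Refuse requests that facilitate illegal activity and explain why.",
    "Avoid biased or stereotyping language about demographic groups.",
]

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: normalized bag-of-characters.
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def retrieve_guidelines(user_input: str, k: int = 2) -> list:
    query = embed(user_input)
    scores = [float(query @ embed(g)) for g in GUIDELINES]
    top = np.argsort(scores)[::-1][:k]
    return [GUIDELINES[i] for i in top]

def build_prompt(user_input: str) -> str:
    rules = "\n".join(f"- {g}" for g in retrieve_guidelines(user_input))
    return f"Follow these guidelines:\n{rules}\n\nUser: {user_input}\nAssistant:"

print(build_prompt("Tell me my neighbor's home address."))
```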
Detection of Spider Mites on Labrador Beans through Machine Learning Approaches Using Custom Datasets
Liu, Violet, Chen, Jason, Qureshi, Ans, Nejati, Mahla
Amidst growing food production demands, early plant disease detection is essential to safeguard crops. This study proposes a visual machine learning approach to plant disease detection, harnessing RGB and NIR data collected in real-world conditions with a JAI FS-1600D-10GE camera to build an RGBN dataset. A two-stage early plant disease detection model combining YOLOv8 and a sequential CNN was trained on a dataset with partial labels and showed a 3.6% increase in mAP compared to a single-stage end-to-end segmentation model. The sequential CNN model achieved 90.62% validation accuracy using RGBN data. Using RGBN data instead of RGB yielded an average validation accuracy increase of 6.25% in classification with the ResNet15 and sequential CNN models. Further research and dataset improvements are needed to meet food production demands.
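As an illustration of the two-stage design, here is a hedged PyTorch sketch: a detector proposes regions (standing in for the YOLOv8 stage), and a small sequential CNN classifies each four-channel RGB+NIR crop. The detector stub, layer sizes, and class labels are assumptions for illustration, not the paper's exact models.

```python
# Sketch of a two-stage pipeline: stage 1 proposes leaf/bean regions, stage 2
# classifies each 4-channel (RGB + NIR) crop as healthy vs. mite-damaged.
# The detector stub and CNN layout are illustrative, not the paper's models.
import torch
import torch.nn as nn

classifier = nn.Sequential(               # toy "sequential CNN" over RGBN crops
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),           # healthy vs. spider-mite damage
)

def detect_regions(image: torch.Tensor):
    # Placeholder for the first stage (e.g. a YOLOv8 model trained on the
    # partially labeled dataset); returns (x0, y0, x1, y1) boxes.
    return [(0, 0, 64, 64), (64, 64, 128, 128)]

def classify_crops(image: torch.Tensor):
    preds = []
    for x0, y0, x1, y1 in detect_regions(image):
        crop = image[:, :, y0:y1, x0:x1]                  # (1, 4, h, w)
        crop = nn.functional.interpolate(crop, size=(64, 64))
        preds.append(int(classifier(crop).argmax(dim=1)))
    return preds

rgbn_image = torch.rand(1, 4, 256, 256)   # stand-in for an RGB+NIR capture
print(classify_crops(rgbn_image))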
Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data
Bellamy, David R., Kumar, Bhawesh, Wang, Cindy, Beam, Andrew
In recent years, self-supervised pre-training of masked language models (MLMs) (see Appendix A for background) has demonstrated remarkable success across a wide range of machine learning problems and has led to significant downstream improvements across diverse tasks in natural language processing (Liu et al., 2019; Devlin et al., 2019; Raffel et al., 2020). There is considerable excitement surrounding the potential of large pre-trained MLMs to achieve similar success in medical applications. For instance, existing applications of MLMs in medicine have already yielded promising results in tasks related to medical text understanding (Lee et al., 2020; Alsentzer et al., 2019; Huang et al., 2019; Yang et al., 2019; Beltagy et al., 2019). Laboratory data is abundant, routinely collected, less biased than other types of data in electronic health records (EHRs) such as billing codes (Beam et al., 2021), and directly measures a patient's physiological state, offering a valuable opportunity for creating a medical foundation model. However, there is a large body of evidence showing that deep learning is consistently outperformed on so-called "tabular" data prediction tasks by traditional machine learning techniques like random forests, XGBoost, and even simple regression models (Bellamy et al., 2020; Finlayson et al., 2023; Sharma, 2013). The reasons for this are only partially understood, but previous work (Grinsztajn et al., 2022) has suggested that this phenomenon may be caused by a rotational invariance in deep learning models that is harmful for tabular data. More broadly, the success of deep learning is thought to be largely due to inductive biases that can be leveraged for images, text, and graphs; these inductive biases are absent or only weakly present in tabular data. Conversely, tree-based methods are scale invariant and robust to uninformative features. We evaluated both models on several downstream outcome prediction tasks and validated the success of pre-training with a set of intrinsic evaluations. Both models demonstrate mastery of the pre-training task, but neither consistently outperforms XGBoost on downstream supervised tasks. We encourage future work to focus on joint modeling of multiple EHR data categories and to include tree-based baselines in their evaluations.
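The pre-training task alluded to here is masked modeling over laboratory measurements rather than text. The toy sketch below shows one way such a batch could be constructed: each record is a bag of (lab test code, value) pairs, and a random subset of values is masked for the model to reconstruct. The token scheme and mask fraction are assumptions for illustration, not the Labrador model's actual input format.

```python
# Toy construction of a masked pre-training example over laboratory data.
# The MASK token and mask probability are illustrative assumptions.
import random

MASK = "<mask>"

def make_masked_example(labs, mask_prob=0.15, seed=0):
    """labs: list of (test_code, value) pairs; returns (inputs, targets)."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for code, value in labs:
        if rng.random() < mask_prob:
            inputs.append((code, MASK))      # hide the measured value
            targets.append((code, value))    # the model must predict it
        else:
            inputs.append((code, value))
    return inputs, targets

example = [("sodium", 139.0), ("potassium", 4.1), ("creatinine", 0.9),
           ("glucose", 104.0), ("hemoglobin", 13.6)]
inputs, targets = make_masked_example(example, mask_prob=0.4)
print(inputs)
print(targets)
```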
Learning Terrain-Adaptive Locomotion with Agile Behaviors by Imitating Animals
Li, Tingguang, Zhang, Yizheng, Zhang, Chong, Zhu, Qingxu, sheng, Jiapeng, Chi, Wanchao, Zhou, Cheng, Han, Lei
In this paper, we present a general learning framework for controlling a quadruped robot that can mimic the behavior of real animals and traverse challenging terrains. Our method consists of two steps: an imitation learning step that learns from the motions of real animals, and a terrain adaptation step that enables generalization to unseen terrains. We capture motions from a Labrador on various terrains to facilitate terrain-adaptive locomotion. Our experiments demonstrate that our policy can traverse various terrains and produce natural-looking behavior. We deployed our method on the real quadruped robot Max via zero-shot simulation-to-reality transfer, achieving a speed of 1.1 m/s while climbing stairs.
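The imitation learning step typically rewards the policy for tracking the captured reference motion. The sketch below shows a common exponential tracking reward of that kind; the weights, scale, and joint layout are generic assumptions from the motion-imitation literature, not this paper's exact reward.

```python
# Hedged sketch of a motion-imitation reward: the policy is rewarded for
# tracking reference joint angles and base velocity captured from an animal.
# The exponential form and weights are common choices, assumed here.
import numpy as np

def imitation_reward(robot_joints, ref_joints, robot_base_vel, ref_base_vel,
                     w_pose=0.6, w_vel=0.4, scale=5.0):
    pose_err = np.sum((robot_joints - ref_joints) ** 2)
    vel_err = np.sum((robot_base_vel - ref_base_vel) ** 2)
    return w_pose * np.exp(-scale * pose_err) + w_vel * np.exp(-scale * vel_err)

# Toy usage with a 12-joint quadruped configuration.
ref_q = np.zeros(12)
robot_q = ref_q + 0.02 * np.random.randn(12)
print(imitation_reward(robot_q, ref_q,
                       np.array([1.0, 0.0, 0.0]), np.array([1.1, 0.0, 0.0])))
```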
Consumer Robotics Show | TechCrunch
CES has always been a weird show for robotics. It's true the organization behind the show dropped the name "Consumer Electronics Show" some number of years ago (a fact it continues to be very insistent about in its press materials), but at its heart the show is still very much about consumer technologies. For robotics, consumer has been an exceedingly difficult nut to crack, for reasons of pricing, scalability and the general unpredictability of operating in uncontrolled environments. The robotic vacuum has long been the main exception to that rule, and robotic vacuums have been the one consistent robotics feature at the show over the past decade-plus. Back in 2020 (the last time TechCrunch attended the show in person), I wrote a piece titled, "Companies take baby steps toward home robots at CES." Fittingly (for reasons that will be made clear below), the first person I quoted in the piece was Labrador Systems co-founder/CEO Mike Dooley, who told me, "I think there are fewer fake robots this year."
Labrador "Retriever" robot for those with chronic pain
A new personal robot takes a page from the massive proliferation of enterprise materials handling robots over the past few years. The Retriever from developer Labrador Systems is like a sleek version of the autonomous mobile robots (AMRs) that are now commonplace in logistics centers and manufacturing. After several years of, frankly, ridiculous personal robotic humanoids and mobile robot assistants that amount to very expensive Alexas on wheels, this kind of practical consumer unit is a welcome evolution of existing successful commercial platforms. To put a finer point on it, here's a robot that some users might actually have a use for. "There's a significant portion of our society that's massively underserved," says Labrador Systems CEO Mike Dooley.
The breeder of the world's first Labradoodle warns over-breeding threatens to turn it into a 'monster'
One of the most coveted and recognizable dogs, the labradoodle, may actually be a 'monster,' says the breed's progenitor. According to Wally Conron, an Australia native who was the first person to breed the labradoodle - a cross between a poodle and a labrador - the dog opened up a 'Pandora's Box.' 'I bred the labradoodle for a blind lady whose husband was allergic to dog hair,' Conron told Australia Broadcast Network. 'She wanted to know if we could come up with a dog that she could use as a guide dog and her husband wouldn't be allergic to.' The issue, says Conron, who was working for the Royal Guide Dogs Association of Australia at the time, wasn't in finding a breed less harsh on one's allergies, it was finding one that was hypoallergenic and had the right temperament. Poodles, though they met the shedding criteria, didn't quite have the same friendliness factor as labradors, so Conron decided to mix the two.
A new tool helps us understand what an AI is actually thinking
Google researchers developed a way to peer inside the minds of deep-learning systems, and the results are delightfully weird. What they did: The team built a tool that combines several techniques to provide people with a clearer idea of how neural networks make decisions. Applied to image classification, it lets a person visualize how the network develops its understanding of what is, for instance, a kitten or a Labrador. The visualizations are ... strange. Why it matters: Deep learning is powerful, but opaque.
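Visualizations of this kind are usually produced by activation maximization: optimizing an input image so that a chosen class logit responds as strongly as possible. Below is a small, hypothetical PyTorch sketch of that general idea, not Google's actual tool; the model, class index, learning rate, and step count are placeholders.

```python
# Hedged sketch of feature visualization via activation maximization: start
# from noise and optimize the image to excite one class logit. The network
# here is an untrained stand-in (a real run would use pretrained weights).
import torch
from torchvision import models

model = models.resnet18(weights=None).eval()   # stand-in classifier
target_class = 208                             # assumed ImageNet index for "Labrador retriever"

image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(50):
    optimizer.zero_grad()
    logits = model(image)
    loss = -logits[0, target_class]            # maximize the class activation
    loss.backward()
    optimizer.step()

with torch.no_grad():
    print("activation after optimization:", float(model(image)[0, target_class]))
```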